A Model-Agnostic Verification Framework for Hallucination Risk Quantification in Large Language Models

Authors: Surajit Tunga, Tripti Pramanik, Debmalya Pal, Sourajit Dasgupta, Sharmistha Das, Suparna Pal, Sritama Pal, Nabaneeta Banerjee

DOI Link: https://doi.org/10.22214/ijraset.2026.83796

Abstract

Hallucination poses critical safety challenges in Large Language Models (LLMs), particularly in high-stakes domains where factual reliability and logical consistency are essential. Retrieval-augmented and reasoning-enhanced architectures have been proposed to reduce hallucination; however, a systematic, model-agnostic framework for quantifying hallucination remains insufficiently explored. In this work, we propose a Model-Agnostic Verification Framework (MAVF) that operates as an external safety layer over LLM-based systems. The framework introduces a formal hallucination risk function integrating three complementary dimensions: semantic evidence alignment between generated outputs and retrieved context, logical consistency of the generated response, and confidence calibration. By decoupling verification from generation, the proposed approach enables continuous and interpretable risk quantification independent of the underlying model architecture.

Introduction

Large Language Models (LLMs) have achieved significant success in natural language processing, but they continue to suffer from hallucinations—the generation of plausible yet factually incorrect or unsupported information. Although techniques such as Retrieval-Augmented Generation (RAG) improve factual grounding by incorporating external knowledge, hallucinations can still occur due to reasoning errors, overreliance on internal model knowledge, or misinterpretation of retrieved evidence. Existing approaches mainly focus on reducing hallucinations through retrieval improvements, detection methods, or architectural changes, but they often treat hallucination as a binary problem rather than a measurable reliability risk.

To address this limitation, the study proposes a Model-Agnostic Verification Framework (MAVF) that reconceptualizes hallucination as a continuous and quantifiable safety risk. MAVF functions as an external verification layer that can be applied to any LLM without modifying its architecture. The framework evaluates generated responses using three key dimensions:

Semantic Evidence Alignment (SEA): Measures how well the response is supported by retrieved evidence using semantic similarity or entailment techniques.
Logical Consistency (LC): Assesses whether the response is internally coherent and free from contradictions.
Confidence Calibration (CC): Evaluates whether the model’s confidence aligns with its actual likelihood of being correct.
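The source does not specify how each score is computed, so the following is only a minimal sketch of one way to map the three dimensions onto [0, 1]. It assumes: SEA is the cosine similarity between precomputed embedding vectors of the response and the retrieved evidence, rescaled from [-1, 1] to [0, 1]; LC is one minus the largest contradiction probability an NLI classifier assigns to any sentence pair in the response; and CC is one minus the expected calibration error (ECE) over a held-out set of past predictions, since correctness of a single new response is not observable. The function names `sea_score`, `lc_score` and `cc_score` are hypothetical, and the embedding model and NLI classifier are left out.

```python
import math
from typing import Sequence


def sea_score(response_vec: Sequence[float], evidence_vec: Sequence[float]) -> float:
    """Semantic Evidence Alignment: cosine similarity rescaled from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(response_vec, evidence_vec))
    norm = math.sqrt(sum(a * a for a in response_vec)) * math.sqrt(sum(b * b for b in evidence_vec))
    if norm == 0.0:
        return 0.0  # a zero vector carries no evidence signal; treat as unsupported
    cos = dot / norm
    return min(1.0, max(0.0, (cos + 1.0) / 2.0))  # clamp away float round-off


def lc_score(contradiction_probs: Sequence[float]) -> float:
    """Logical Consistency: 1 minus the strongest pairwise contradiction probability.

    The probabilities are assumed to come from an NLI classifier applied to
    sentence pairs of the response; with no pairs, nothing contradicts.
    """
    return 1.0 - max(contradiction_probs, default=0.0)


def cc_score(confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10) -> float:
    """Confidence Calibration: 1 minus the expected calibration error (ECE)."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally long, non-empty confidence and label lists")
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Equal-width bins [lo, hi); the last bin also takes confidence == 1.0.
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - mean_conf)
    return 1.0 - ece
```

For example, `sea_score([1.0, 0.0], [0.0, 1.0])` gives 0.5 for orthogonal embeddings, and `lc_score([])` gives 1.0 when no sentence pair is checked. ECE is a standard calibration metric swapped in here for illustration; other calibration measures would fit the same slot.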

Each component produces a normalized score between 0 and 1. These scores are combined using a non-linear multiplicative reliability function, which generates a continuous hallucination risk score. The hallucination risk is defined as the inverse of the overall reliability score, ensuring that weaknesses in any one dimension significantly increase the final risk estimate.
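A minimal sketch of the aggregation step follows. It assumes the reliability is a weighted product R = SEA^w1 · LC^w2 · CC^w3 and reads "inverse" as the complement H = 1 − R, so that the risk stays in [0, 1]; the source gives neither the exact functional form nor the weights. The function names `reliability` and `hallucination_risk`, the default exponents of 1, and the restriction to exponents of at least 1 (fractional exponents would make the function non-Lipschitz near zero) are all assumptions made for illustration.

```python
from math import prod


def reliability(sea: float, lc: float, cc: float,
                weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Multiplicative reliability R = SEA^w1 * LC^w2 * CC^w3 (assumed form)."""
    scores = (sea, lc, cc)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"component score {s} is outside [0, 1]")
    # Exponents below 1 have unbounded derivatives at 0, which would break
    # Lipschitz continuity; this sketch therefore requires w_i >= 1.
    if any(w < 1.0 for w in weights):
        raise ValueError("exponents must be >= 1")
    return prod(s ** w for s, w in zip(scores, weights))


def hallucination_risk(sea: float, lc: float, cc: float,
                       weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Risk as the complement of reliability, so it stays in [0, 1]."""
    return 1.0 - reliability(sea, lc, cc, weights)


# A response that is well grounded and coherent but poorly calibrated:
print(round(hallucination_risk(0.9, 0.9, 0.5), 3))  # 0.595
```

Because the scores multiply, a single weak dimension (here CC = 0.5) roughly halves the reliability on its own, which is the behaviour the paragraph above describes; an additive average of the same scores would only drop to about 0.77.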

The proposed framework offers several advantages:

It is architecture-independent and compatible with different LLM systems.
It provides an interpretable and continuous reliability score instead of simple binary classification.
It captures the interaction between factual grounding, reasoning quality, and confidence.
It supports risk-aware decision-making, particularly in safety-critical applications.

Experimental validation compares the multiplicative risk model with linear and semantic-only approaches. Results show that MAVF is more sensitive to compounded reliability failures, assigns higher risk to responses with multiple weaknesses, remains stable under small perturbations, and provides clearer separation between reliable and unreliable outputs.
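The toy sketch below illustrates the kind of comparison described, using hand-picked hypothetical (SEA, LC, CC) triples rather than the paper's data. It assumes the multiplicative risk is 1 − SEA · LC · CC, the linear baseline is 1 minus an equally weighted mean, and the semantic-only baseline is 1 − SEA; all three forms and the function names are illustrative assumptions. It also checks the small-perturbation bound that holds for a unit-exponent product.

```python
from math import prod
from typing import Sequence


def multiplicative_risk(scores: Sequence[float]) -> float:
    """Assumed MAVF-style risk: 1 - SEA * LC * CC."""
    return 1.0 - prod(scores)


def linear_risk(scores: Sequence[float],
                weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Additive baseline: 1 - weighted mean of the component scores."""
    return 1.0 - sum(w * s for w, s in zip(weights, scores))


def semantic_only_risk(scores: Sequence[float]) -> float:
    """Baseline that looks only at evidence alignment (the first score, SEA)."""
    return 1.0 - scores[0]


# Hypothetical (SEA, LC, CC) triples chosen for illustration, not measured data.
responses = {
    "uniformly mediocre": (0.6, 0.6, 0.6),
    "grounded but incoherent": (0.95, 0.4, 0.4),
    "reliable": (0.95, 0.95, 0.9),
}

for name, s in responses.items():
    print(f"{name:24s} multiplicative={multiplicative_risk(s):.3f} "
          f"linear={linear_risk(s):.3f} semantic-only={semantic_only_risk(s):.3f}")

# Stability: with unit exponents every partial derivative of the product lies in
# [0, 1], so a small perturbation moves the risk by at most the L1 size of the nudge.
base, nudged = (0.8, 0.7, 0.9), (0.81, 0.69, 0.9)
gap = abs(multiplicative_risk(base) - multiplicative_risk(nudged))
assert gap <= sum(abs(a - b) for a, b in zip(base, nudged))
```

For the "grounded but incoherent" triple, the semantic-only baseline reports a risk of about 0.05 and the linear baseline about 0.42, while the multiplicative form reports about 0.85, which is the sensitivity to compounded failures claimed above.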

Conclusion

This work presents a Model-Agnostic Verification Framework (MAVF) for hallucination risk quantification in Large Language Models. We formalized hallucination as a continuous reliability risk variable and proposed a multiplicative risk function integrating semantic evidence alignment, logical consistency, and calibrated confidence. The proposed formulation was rigorously examined and shown to satisfy key mathematical properties, including boundedness, monotonicity, interaction sensitivity, and Lipschitz continuity. Compared to additive aggregation strategies, the multiplicative model more effectively captures joint reliability dependence and penalizes compounded weaknesses in grounding, reasoning, and confidence estimation. Controlled validation shows that the framework provides interpretable and stability-aware risk evaluation appropriate for safety-critical applications. By decoupling verification from generation, the proposed framework allows architecture-independent deployment over different LLM systems. Future work will extend the model to large-scale practical benchmarking, adaptive weight optimization, and domain-specific calibration strategies. The proposed risk modelling perspective offers a principled foundation for reliability-aware and confidence-aware deployment of Large Language Models.
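If the reliability function is taken to be the plain product of the three scores (an assumption; the source states neither the exact form nor its weights), the four properties listed above can be verified directly. Here S, L and C denote the SEA, LC and CC scores, and H is read as the complement of the reliability R.

```latex
\begin{aligned}
&R(S, L, C) = S \cdot L \cdot C, \qquad H(S, L, C) = 1 - R(S, L, C), \qquad S, L, C \in [0, 1] \\[4pt]
&\text{Boundedness: } 0 \le R \le 1 \;\Rightarrow\; 0 \le H \le 1 \\[4pt]
&\text{Monotonicity: } \frac{\partial H}{\partial S} = -LC \le 0, \quad
  \frac{\partial H}{\partial L} = -SC \le 0, \quad
  \frac{\partial H}{\partial C} = -SL \le 0 \\[4pt]
&\text{Interaction sensitivity: } \frac{\partial^2 H}{\partial S \, \partial L} = -C \ne 0 \ \text{ for } C > 0,
  \ \text{whereas an additive } H \text{ has all mixed partials equal to } 0 \\[4pt]
&\text{Lipschitz continuity: } \lvert H(\mathbf{x}) - H(\mathbf{y}) \rvert \le \sum_{i} \lvert x_i - y_i \rvert = \lVert \mathbf{x} - \mathbf{y} \rVert_1,
  \ \text{since every partial derivative of } R \text{ lies in } [0, 1]
\end{aligned}
```

The nonzero mixed partial is what distinguishes the multiplicative form from additive aggregation: the marginal effect of one dimension on the risk depends on the values of the others.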

Copyright

Copyright © 2026 Surajit Tunga, Tripti Pramanik, Debmalya Pal, Sourajit Dasgupta, Sharmistha Das, Suparna Pal, Sritama Pal, Nabaneeta Banerjee. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET83796

Publish Date : 2026-06-18

ISSN : 2321-9653

Publisher Name : IJRASET